doi: 10.17586/2226-1494-2024-24-1-101-111


Deep attention based Proto-oncogene prediction and Oncogene transition possibility detection using moments and position based amino acid features

M. Vijayalakshmi, M. Vallinayagi


Article in English

For citation:
Vijayalakshmi M., Vallinayagi M. Deep attention based Proto-oncogene prediction and Oncogene transition possibility detection using moments and position based amino acid features. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 1, pp. 101–111. doi: 10.17586/2226-1494-2024-24-1-101-111


Abstract
The loss of the regulatory function of tumor suppressor genes and mutations in proto-oncogenes are the common underlying mechanisms of uncontrolled tumor growth in the varied complex of disorders known as cancer. Cancers driven by oncogenes may be curable if the proto-oncogenes at risk of transformation are diagnosed and treated at an early stage. Recently, machine learning approaches have helped to provide information about which proto-oncogenes may change into oncogenes in different cancer types. This study helps to identify, at an early stage, proto-oncogenes that are likely to change into oncogenes. To that end, it proposes an efficient predictor of proto-oncogenes based on a Bi-Directional Long Short-Term Memory (BiLSTM) network combined with an attention mechanism. The approach also estimates the probability of a proto-oncogene-to-oncogene transition using statistical moments, a position-based amino acid composition representation, and deep features extracted from the sequence. The results suggest that a K-Nearest Neighbor classifier can be used to estimate the probability of a proto-oncogene changing into a cancerous oncogene.
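
The hand-crafted part of the pipeline described in the abstract (amino acid composition, position-based features, statistical moments, and a K-Nearest Neighbor probability estimate) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the exact feature definitions, the residue encoding, the Euclidean distance, and the toy sequences are all assumptions made for illustration, and the deep BiLSTM-with-attention features are omitted.

```python
import math

# The 20 standard amino acids; the ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard residue in the sequence."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def mean_relative_position(seq):
    """Mean position of each residue, scaled to (0, 1]; 0.0 if absent."""
    n = len(seq)
    feats = []
    for a in AMINO_ACIDS:
        pos = [(i + 1) / n for i, r in enumerate(seq) if r == a]
        feats.append(sum(pos) / len(pos) if pos else 0.0)
    return feats


def raw_moments(seq, orders=(1, 2, 3)):
    """Raw moments of the sequence encoded as residue indices scaled to [0, 1]."""
    vals = [AMINO_ACIDS.index(r) / (len(AMINO_ACIDS) - 1) for r in seq]
    return [sum(v ** k for v in vals) / len(vals) for k in orders]


def featurize(seq):
    """Concatenate the three feature groups into one 43-element vector."""
    seq = [r for r in seq.upper() if r in AMINO_ACIDS]  # drop non-standard symbols
    if not seq:
        raise ValueError("sequence contains no standard residues")
    return composition(seq) + mean_relative_position(seq) + raw_moments(seq)


def knn_probability(query, train_seqs, train_labels, k=3):
    """Fraction of the k nearest training sequences labeled 1 (hypothetical 'oncogene' class)."""
    q = featurize(query)
    dists = sorted(
        (math.dist(q, featurize(s)), y) for s, y in zip(train_seqs, train_labels)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy data, invented purely for illustration (not real proteins).
train_seqs = ["AAAAKKKK", "GGGGLLLL"]
train_labels = [1, 0]
prob = knn_probability("AAAAKKKK", train_seqs, train_labels, k=1)
```

Here the neighbor vote is used as a crude probability estimate; a real predictor would need a curated dataset, feature scaling, and a tuned value of k.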

Keywords: Proto-oncogene, PseAAC, prediction, tumour suppression genes, TSG, machine learning, Bi-directional Long Short Term Memory (BiLSTM)

Acknowledgements. Special thanks to Dr. L. Rajagopala Marthandam, HOD of Medicine, TVMCH, India for his encouragement and support.



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
